Balanced Random Survival Forests for Extremely Unbalanced, Right Censored Data
Abstract
The accuracy of survival models for life expectancy prediction and for lifesaving critical care applications is significantly compromised by the sparsity of samples and the extreme imbalance between the survival and mortality classes, in addition to the invalidity of the popular proportional hazards assumption. Imbalanced data lead to an underestimation of the hazard of the mortality class and an overestimation of the hazard of the survival class. To address this gap, a balanced random survival forests (BRSF) model is presented, based on training random survival forests with balanced data generated from a synthetic minority sampling scheme. Theoretical findings on the improvement of survival prediction after balancing are corroborated by extensive empirical evaluations. Benchmarking studies consider five data sets with different levels of class imbalance from public repositories and an imbalanced survival data set of 267 ST-elevated myocardial infarction (STEMI) patients collected over a period of one year at the Heart, Artery, and Vein Center of Fresno, CA. Investigations suggest that BRSF provides better discriminatory strength between the censored and mortality classes and improves survival prediction for the minority class. BRSF outperformed both optimized Cox models (with and without balancing) and RSF, with a 55% reduction in prediction error (averaged over all six data sets) relative to the next best alternative.
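The abstract does not spell out the synthetic minority sampling scheme, so the following is only a minimal sketch of generic SMOTE-style oversampling applied to survival data, not the authors' implementation. The function name `smote_balance`, the convention that `event == 1` marks the minority (mortality) class, and the choice to interpolate survival times along with the features are all assumptions made for illustration.

```python
import math
import random


def smote_balance(features, times, events, k=3, seed=0):
    """Oversample the minority class (assumed: event == 1) with
    SMOTE-style interpolation until both classes have equal size.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    minority = [i for i, e in enumerate(events) if e == 1]
    majority = [i for i, e in enumerate(events) if e == 0]
    n_needed = len(majority) - len(minority)

    # Nothing to do if already balanced, or too few points to interpolate.
    if n_needed <= 0 or len(minority) < 2:
        return list(features), list(times), list(events)

    new_x, new_t = [], []
    for _ in range(n_needed):
        i = rng.choice(minority)
        # k nearest minority neighbours of sample i (Euclidean distance).
        neighbours = sorted(
            (j for j in minority if j != i),
            key=lambda j: math.dist(features[i], features[j]),
        )[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from i to j
        new_x.append(
            [a + gap * (b - a) for a, b in zip(features[i], features[j])]
        )
        # Assumption: the synthetic event time is interpolated the same way.
        new_t.append(times[i] + gap * (times[j] - times[i]))

    return (
        list(features) + new_x,
        list(times) + new_t,
        list(events) + [1] * n_needed,
    )
```

The balanced output would then be passed to any random survival forest implementation for training; the original (unbalanced) samples are kept unchanged at the front of the returned lists.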
Similar resources
Random Survival Forests
We introduce random survival forests, a random forests method for the analysis of right-censored survival data. New survival splitting rules for growing survival trees are introduced, as is a new missing data algorithm for imputing missing data. A conservation-of-events principle for survival forests is introduced and used to define ensemble mortality, a simple interpretable measure of mortalit...
Strong Convergence Rates of the Product-limit Estimator for Left Truncated and Right Censored Data under Association
Non-parametric estimation of a survival function from left truncated data subject to right censoring has been extensively studied in the literature. It is commonly assumed in such studies that the lifetime variables are a sample of independent and identically distributed random variables from the target population. This assumption is often prone to failure in practical studies. For instance, wh...
Consistency of Random Survival Forests.
We prove uniform consistency of Random Survival Forests (RSF), a newly introduced forest ensemble learner for analysis of right-censored survival data. Consistency is proven under general splitting rules, bootstrapping, and random selection of variables, that is, under true implementation of the methodology. Under this setting we show that the forest ensemble survival function converges uniforml...
Rotation survival forest for right censored data
Recently, survival ensembles have found more and more applications in biological and medical research, where censored time-to-event data are often encountered. In this research, we investigate the plausibility of extending a rotation forest, originally proposed for classification purposes, to survival analysis. Supported by the proper statistical analysis, we show that rotation survival forests are...